perm filename OVERV[DIS,DBL]1  blob 
sn#209291 filedate 1976-04-05 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00009 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.NSECP(Overview)
C00003 00003	.SSEC(Three-page Summary of the Project)
C00023 00004	. SSSEC(1-sentence summary of the thesis)
C00024 00005	.SSEC(Motivation: why is this worthwhile?)
C00026 00006	.SSEC(Fifteen-page Summary of the entire project)
C00027 00007	. SSSEC(HISTORIAN'S SUMMARY of AM's researches)
C00033 00008	.SSEC(Guide to reading the remainder of the thesis)
C00034 00009	. SSSEC(VARIED READERSHIP of this thesis)
C00039 ENDMK
C⊗;
.NSECP(Overview)
.SSEC(Three-page Summary of the Project)
Scientists  often face  the difficult  task  of formulating  research
problems which  must be soluble yet nontrivial.   In any given branch
of science, it is usually  easier to tackle a specific given  problem
than  to   propose  interesting   yet  managable  new   questions  to
investigate.   For example, contrast ⊗4solving⊗* the Missionaries and
Cannibals problem with  the more ill-defined  reasoning which led  to
⊗4inventing⊗* it.
This   thesis  is  concerned   with  creative  theory   formation  in
mathematics: how to  propose interesting new  concepts and  plausible
hypotheses connecting them.  The experimental  vehicle of my research
is   a  computer   program  called  ⊗2AM⊗*   (for  ⊗2↓_A_↓⊗*pprentice
⊗2↓_M_↓⊗*athematician), which  carries  out some  of  the  activities
involved in  mathematical research: noticing simple  relationships in
empirical  data, formulating  new definitions  out of  existing ones,
proposing some  plausible conjectures, and  evaluating the  aesthetic
"interestingness" of new concepts.
Before  discussing  how  to  ⊗4synthesize⊗*  a new  theory,  consider
briefly how to ⊗4analyze⊗* one, how to construct a plausible chain of
reasoning which terminates in a given discovery.   One can do this by
working  backwards,  by  reducing the  creative  act  to simpler  and
simpler creative acts.   For example, consider  the concept of  prime
numbers.  How  might one be led  to define such a  notion? Notice the
following plausible strategy:
.ONCE INDENT 9,9,9 SELECT 6
"If  f is a function which transforms  elements of A into elements of
B, and B is ordered, then consider just those members  of A which are
transformed  into  ⊗4extremal⊗*  elements  of  B.   This  set  is  an
interesting subset of A."
When f(x) means "factors of x", and the ordering is "by length", this
heuristic says to consider  those numbers which have a  minimal$$ The
other  extreme, numbers with  a MAXIMAL  number of factors,  was also
proposed  by AM  as  worth  investigating.    This  led  AM  to  many
interesting questions; the  only "new-to-Mankind" mathematical result
so   far   is  that   numbers   of   the  following   form   are  all
maximally-divisible:   p⊗B1⊗*⊗Aa1⊗*   p⊗B2⊗*⊗Aa2⊗*    p⊗B3⊗*⊗Aa3⊗*...
p⊗Bk⊗*⊗Aak⊗*, where the p⊗Bi⊗*'s  are the first k consecutive primes,
and  the  exponents  a⊗Bi⊗*  decrease  with  i,  and  the  ratio   of
(a⊗Bi⊗*+1)/(a⊗Bj⊗*+1) is approximately (as closely  as is possibe for
integers)   log(p⊗Bj⊗*)/log(p⊗Bi⊗*).      For   example,  a   typical
divisor-rich                        number                         is
n=2⊗A8⊗*3⊗A5⊗*5⊗A3⊗*7⊗A2⊗*11⊗A2⊗*13⊗A1⊗*17⊗A1⊗*19⊗A1⊗*23⊗A1⊗*29⊗A1⊗*31⊗A1⊗*37⊗A1⊗*41⊗A1⊗*43⊗A1⊗*47⊗A1⊗*53⊗A1⊗*.
The progression of  its exponents+1 (9 6 4 3 3 2 2 2 2 2 2 2 2 2 2 2)
is about  as  close as  one  can get  to satisfying  the  "logarithm"
constraint.    This  number  n  has  3,981,312  divisors.    The  "AM
Conjecture" is that no number smaller than n has that many  divisors.
By  the way,  this  n  equals  25,608,675,584.   The  empirical  data
gathered  indicated the general character  of the declining exponents
to AM,  but the  precise  formulation of  the "logs"  constraint  was
beyond AM,  since it doesn't know  about logs. In fact,  this was the
warped  way in which AM first defined a  concept like the one we call
"logarithm". The proof of  the conjecture involves calculus,  and for
that reason  was also  far beyond the  abilities of AM.  This example
shows  the   usefulness   of  a   mechanical   "co-researcher",   the
effectiveness of  AM--human cooperation.$ number  of factors  -- that
is,  the primes.  So  this rule  actually ⊗4reduces⊗*  our  task from
"proposing the  concept  of prime  numbers"  to the  more  elementary
problems of "discovering cardinality" and "inventing factorization".
But suppose  we know  this general  rule: ⊗6"If  f is  an interesting
function, consider its inverse."⊗* It reduces the task of discovering
factorization to  the  simpler  task of  discovering  multiplication.
Eventually, this task reduces to the discovery of very basic notions,
like substitution, set-union, and equality.   To explain how a  given
researcher might  have made a  given discovery,  such an analysis  is
continued  until  that  inductive task  is  reduced  to "discovering"
notions which the researcher already knew.
Suppose a  large collection of  these heuristics  has been  assembled
(e.g., by  analyzing a great  many discoveries, and writing  down new
heuristic  rules  whenever  necessary).   Instead  of  using  them to
⊗4explain⊗* how  a given  idea might  have evolved,  one can  imagine
starting from a basic  core of knowledge and "running" the heuristics
to ⊗4generate⊗* new concepts.
Such syntheses are precisely what AM does.  The program consists of a
corpus of primitive mathematical concepts and a collection of guiding
heuristics (currently a couple hundred of each).  AM's activities all
serve  to expand  AM  itself$$  Incidentally,  these  basic  concepts
include   the    operators   which    enlarge   the    space   (e.g.,
COMPOSE-2-RELATIONS  is both a concept in its  own right and a way to
generate new ones).  An important kind of heuristic knowledge that AM
possesses  is how  to  create new  heuristics to  associate  with new
concepts  it  defines.  So  AM  can  create  new  concepts  and   new
heuristics.    $,  to enlarge  upon  a  given  body  of  mathematical
knowledge.  To cope with the enormity of the potential "search space"
involved, AM  uses its  heuristics as  judgmental  criteria to  guide
development in  the most  promising direction.   It appears  that the
process  of inventing worthwhile new$$ Typically,  "new" means new to
AM, not to Mankind, and "worthwhile" can only be judged in hindsight.
$ concepts  can be  guided successfully using  a collection of  a few
hundred such heuristics.
Each concept is represented as a ⊗6BEING⊗* [Lenat], a frame-like data
structure with  30 different  facets or  slots. The  types of  facets
include:   ⊗6Examples,  Definitions,  Generalizations,  Domain/Range,
Analogies,  Interestingness,⊗*  and  a  couple  dozen  others.    The
⊗6BEINGs⊗* representation provides a convenient scheme for organizing
the  heuristics; for  example, the  following strategy fits  into the
⊗4Examples⊗* facet of the ⊗4Predicate⊗* concept: "If, empirically, 10
times as many elements  ⊗4fail⊗* some predicate P, as ⊗4satisfy⊗* it,
then some ⊗4generalization⊗*  (weakened version) of  P might be  more
interesting than P".   AM considers  this suggestion after  trying to
fill  in examples of  any predicate$$ In  fact, after  AM attempts to
find examples of SET-EQUALITY,  so few are  found that AM decides  to
generalize that predicate.   The result is the  predicate which means
"Has-the-same-length-as" -- i.e., the rudiments of Cardinality.  $.
AM is  initially given a large collection of core concepts, with only
a few facets filled in for each.  Its sole activity is to choose some
facet  of some  concept, and  fill in  that particular  slot.   In so
doing, new  notions  will  often  emerge.    Uninteresting  ones  are
forgotten, mildly interesting ones are kept as  parts of one facet of
one  concept,  and very  interesting  ones are  granted  full concept
status. Such new  ⊗6Beings⊗* have  dozens of blank  parts, hence  the
space of  possible actions  (blank slots to  fill in)  grows rapidly.
The  same  heuristics are  used both  to  suggest new  directions for
investigation, and to limit attention: both to grow and to prune.
The particular mathematical  domains in which  AM operates depend  on
the  choice of  initial concepts.  Currently, AM  begins  with scanty
knowledge of a couple hundred concepts which Piaget might describe as
⊗4prenumerical⊗*: Sets, substitution, relations, equality, and so on.
In  particular, AM  is not told  anything about  proof, single-valued
functions, or  numbers.    With  this basis,  AM  quickly  discovered
elementary numerical concepts (corresponding to  those we refer to as
cardinality, multiplication, factors, and primes) and wandered around
in the domain  of elementary  number theory.   Although it was  never
able to  ⊗4prove⊗* the unique factorization theorem,  AM actually did
⊗4conjecture⊗* it$$  Due to  the firm  base of  preliminary  concepts
which AM developed, this relationship was almost  obvious.  AM sought
some  predicate P  which, for  each n,  some member  of FACTORS-OF(n)
satisfied.  ALL-PRIMES was such a predicate.  AM next constructed the
relation which associates, to each  number n, all factorizations of n
into  primes.   The full  statement of  the UFT  is simply  that this
relation is a function; i.e., it is defined and single-valued for all
numbers n. $.   "Discovering" a concept means  that (1) AM recognized
it as a  distinguished entity (e.g.,  by formulating its  definition)
and also (2) AM decided it was worth investigating (either because of
the  interesting  way   it  was  formed,  or  because  of  surprising
preliminary empirical results).
AM was not able  to discover any "new-to-Mankind" mathematics  purely
on its own, but ⊗4has⊗* done  so when working as a co-researcher with
a  human⊗A1⊗*.  AM  noticed simple new  concepts which mathematicians
had overlooked, but AM by itself was not able  to precisely formulate
and prove  interesting statements about  those concepts.   I conclude
that a synergetic AM--human combination can sometimes produce  better
research than either could alone.
Everything that AM does can be viewed  as testing the underlying body
of  heuristics. Gradually,  this knowledge becomes  better organized,
its implications clearer.  The resultant body of  detailed heuristics
may be  the germ  of a  more efficient  programme for  educating math
students  than the current dogma$$ Currently,  the educator takes the
very best work any mathematician has ever done, polishes it until its
brilliance  is blinding,  and presents  it to  the student  to induce
upon. Many individuals (e.g., Knuth and Polya) have pointed out  this
blunder.    A few  (e.g.,  Papert  at  MIT, Adams  at  Stanford)  are
experimenting   with   more  realistic   strategies   for  "teaching"
creativity.  $.
Another exciting prospect opened up by AM is that of experimentation:
one  can vary  the  concepts  AM starts  with,  vary the  heurisitics
available,  etc.,  and  study  the effects  on  AM's  behavior.   One
experiment involved substituting  concepts in an entirely  new domain
(plane geometry)$$ A new -- but bizarre -- result was obtained there:
any angle (between  0 and 180↑o)  can be approximated  to within  one
degree, as the sum of two angles drawn from  the set α{0↑o, 1↑o, 2↑o,
3↑o,  5↑o, 7↑o, 11↑o,..., 179↑oα},  i.e., the set of  angles of prime
size.  If our culture and technology were different, this  might have
been an important well-known result. $.
One difficulty facing  AM is the necessity to judge  ⊗4a priori⊗* the
value of each new concept, to quickly lose interest in concepts which
aren't going  to develop into  anything.   Often, such judgments  can
only be  based on hindsight.  For similar reasons,  AM has difficulty
formulating new heuristics which are relevant to the new concepts  it
creates. Heuristics are often merely compiled  hindsight.  While AM's
"approach"  to empirical  research may  be  used in  other scientific
domains, the main  limitation (reliance  on hindsight) will  probably
recur.
. SSSEC(1-sentence summary of the thesis)
"Creative scientific discovery (at least in elementary mathematics)
 can be successfully represented as a heuristic search process"
.SSEC(Motivation: why is this worthwhile?)
.B
	Inherent interest of getting a handle on the task (sci. creativity)
		Personal belief that discovery can be (ought to be) demystified
		Potential for learning, from the system, more about the process 
			of sci. concept formation, thy. formation, chance discovery
			(do experiments on the implementations: eg, vary AM's heurs)
	Potential usefulness of the implementations themselves (including AM)
		Aids to research; i.e., ultimately: new discoveries.
		Potential to education: like Mycin, extract heurs. and teach them
	All the usual bad reasons:
		"Look ma, no hands" + maternal drives + ego + thesis drives +... 
	Historical: 
		Need task with no specific goal, to test BEINGs ideas.
		Disenchantment with theorem-provers that plod along, in contrast
			to the processes which my model of math demands: intu, need,
	                aesth., multiple reprs, proposing vs proving, fixed task.
	
.E
.SSEC(Fifteen-page Summary of the entire project)
<<first pass: condense the 20-page text of the 1-hour "whirlwind tour" talk>
<<Or: condense each section of this thesis; pull out all the overviews>
<potential organization: mirror the overall organization of the thesis itself>
. SSSEC(HISTORIAN'S SUMMARY of AM's researches)
Before diving into the  depths of AM, let's take a  moment to discuss
the  totality  of the  mathematics  which  AM carried  out.    Like a
contemporary historian summarizing the  work of Euclid, we shall  not
hesitate to use current terms, and criticize by current standards.
AM  began  its   investigations  with  scanty  knowledge   of  a  few
set-theoretic  concepts  (sets, equaility  of sets,  set operations).
Most of  the obvious  set-theory relations (e.g.,  de Morgan's  laws)
were eventually  uncovered; since AM never  fully understood abstract
algebra, the statement and  verification of each  of these was  quite
obscure.   AM  never derived  a  formal notion  of  infinity, but  it
naively established conjectures  like "a set can never be a member of
itself", and procedures for making chains of new sets ("insert  a set
into  itself").    No   sophisticated  set  theory  (diagonalization,
ordinals) was ever done.
After this  initial period of exploration, AM decided that "equality"
was  worth   generalizing,  and  thereby   discovered  the   relation
"same-size-as".  Cardinality was  based on this, and soon most simple
arithmetic operations were defined. Since addition arose as an analog
to union, and multiplication  as an analog to cross-product,  it came
as quite  a surprise when AM noticed  that they were related (namely,
N+N=2xN). Soon  after defining  multiplication, AM  investigated  the
process of multiplying a number by  itself: squaring.  The inverse of
this  turned out  to  be interesting,  and led  to the  definition of
square-root.
Although AM was very close to discovering irrationals  at this point,
it  turned aside and  was content  to work  with integer-square-root.
Raising to fourth-powers, and fourth-rooting, were discovered at this
time.
.ONCE TURN ON "{}"
The associativity  and symmetry of  multiplication indicated  that it
could accept a  BAG (a multiset) of numbers as its argument. This led
to the  notion of  factoring a  number. Minimally-factorable  numbers
turned out to  be what we call primes.   Maximally-factorable numbers
were  also thought  to be  interesting at  the time,  and in  fact an
unusual $$ 
These are the so-called "highly-composite" numbers of Ramanujan.
As far as  the author and his committee know, this  is the
first  such  explicit characterization  of  these  numbers, hence  is
probably "new-to-Mankind".
A similar (but slightly  different) result has recently been noticed in
[Hardy] (p. 93).
Since the purpose of the thesis is not to
derive  "new"  mathematics,   discussion  of  this  result   will  be
minimized in this document. A short discussion will be provided in 
Section {[2] MAXDIVSEC}.{[1] MAXDIVSSEC}, on page {[3] MAXDIVPAGE}.
$ characterization of such numbers was discovered.
AM  conjectured  the  fundamental   theorem  of  arithmetic   (unique
factorization  into primes)  and  Goldbach's  conjecture (every  even
number >2  is the sum of two primes) in a surprisingly symmetric way.
The unary representation of numbers gave way to a representation as a
bag of  primes (based on unique factorization),  but AM never thought
of exponential  notation.   Since  the key  concepts of  modulus  and
exponentiation were never  discovered, progress in number  theory was
arrested.
.SSEC(Guide to reading the remainder of the thesis)
.B
	i) Overall organization of the thesis
	ii) Plans for what to read (and in what order), depending on your interests
		Plan for those interested in the AI ideas
		Plan for those interested in the systems ideas
		Plan for those interested in mathematics
	iii) Pre-requisites and how to satisfy them, for each chapter
		For those with little pure mathematics in their background
		For those with little computer science background
		For computer scientists with little contact to AI before
	    <either organized by "type" of reader, or by chapter/section>
.E
. SSSEC(VARIED READERSHIP of this thesis)
⊗7¬(A∨C∨M)⊗*
This thesis  -- and its  readers --  must come to  grips with a  very
interdisciplinary  problem.   For the reader  whose background  is in
Artificial  Intelligence,  most  of  the  system's  actions   --  the
"mathematics" it does  -- may seem inherently  uninteresting. For the
mathematician,  the  word "LISP"  signifies  nothing beyond  a speech
impediment  (to Artificial  Intelligence  types  it also  connotes  a
programming impediment). If I  don't describe "LISP" the first time I
mention it, a large fraction of potential readers will never  realize
that potential. If I ⊗4do⊗* stop to  describe LISP, the other readers
will  be bored.   The standard solutions  are to  sacrifice one fixed
community in favor of the other, or to be entertaining enough to keep
both of them around. In this work, a third alternative will be taken.
Sections  will be  tagged with  descriptors like  "⊗7M⊗*" (indicating
that the section will be of interest to those who enjoy mathematics),
or "⊗7¬A⊗*" (the  section will be a waste of  time for those familiar
with Artificial Intelligence). The labels will consist of the letters
⊗7A⊗* (for  ↓_A_↓I),  ⊗7M⊗* (for  ↓_M_↓ath), and  ⊗7C⊗* (for  general
↓_C_↓omputer  science), connected by standard  logical symbols: ⊗7¬⊗*
(negation), ⊗7∧⊗* (and), ⊗7∨⊗* (or).  For example, since  the current
paragraph is  labelled ⊗7¬(A∨C∨M)⊗*, the  reader can assume it  is of
little interest to anyone.
In addition, there are two glossaries of terms. Appendix 1a contains
capsule descriptions of about 100 mathemtical terms, ideas, notations,
and jokes. Appendix 1b renders the analogous service for Artificial
Intelligence jargon and computer science concepts.